Overview

Dataset statistics

Number of variables: 25
Number of observations: 2823
Missing cells: 5157
Missing cells (%): 7.3%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 551.5 KiB
Average record size in memory: 200.0 B
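The two derived figures in these statistics can be checked from the raw counts. The following is a minimal sketch, assuming the memory figure uses KiB of 1024 bytes; the variable names are illustrative and not part of the report.

```python
# Figures copied from the dataset statistics.
n_variables = 25
n_observations = 2823
missing_cells = 5157
total_size_kib = 551.5

# Every cell is one (row, column) pair.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```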

Variable types

CAT: 18
NUM: 7

Warnings

ORDERDATE has a high cardinality: 252 distinct values [High cardinality]
PRODUCTCODE has a high cardinality: 109 distinct values [High cardinality]
CUSTOMERNAME has a high cardinality: 92 distinct values [High cardinality]
PHONE has a high cardinality: 91 distinct values [High cardinality]
ADDRESSLINE1 has a high cardinality: 92 distinct values [High cardinality]
CITY has a high cardinality: 73 distinct values [High cardinality]
POSTALCODE has a high cardinality: 73 distinct values [High cardinality]
CONTACTLASTNAME has a high cardinality: 77 distinct values [High cardinality]
CONTACTFIRSTNAME has a high cardinality: 72 distinct values [High cardinality]
MONTH_ID is highly correlated with QTR_ID [High correlation]
QTR_ID is highly correlated with MONTH_ID [High correlation]
YEAR_ID is highly correlated with ORDERNUMBER [High correlation]
ORDERNUMBER is highly correlated with YEAR_ID [High correlation]
PHONE is highly correlated with CUSTOMERNAME and 9 other fields [High correlation]
CUSTOMERNAME is highly correlated with PHONE and 9 other fields [High correlation]
ADDRESSLINE1 is highly correlated with CUSTOMERNAME and 9 other fields [High correlation]
ADDRESSLINE2 is highly correlated with CUSTOMERNAME and 9 other fields [High correlation]
CITY is highly correlated with CUSTOMERNAME and 8 other fields [High correlation]
STATE is highly correlated with CUSTOMERNAME and 7 other fields [High correlation]
POSTALCODE is highly correlated with CUSTOMERNAME and 9 other fields [High correlation]
COUNTRY is highly correlated with CUSTOMERNAME and 9 other fields [High correlation]
TERRITORY is highly correlated with CUSTOMERNAME and 9 other fields [High correlation]
CONTACTLASTNAME is highly correlated with CUSTOMERNAME and 8 other fields [High correlation]
CONTACTFIRSTNAME is highly correlated with CUSTOMERNAME and 7 other fields [High correlation]
ADDRESSLINE2 has 2521 (89.3%) missing values [Missing]
STATE has 1486 (52.6%) missing values [Missing]
POSTALCODE has 76 (2.7%) missing values [Missing]
TERRITORY has 1074 (38.0%) missing values [Missing]
PRODUCTCODE is uniformly distributed [Uniform]
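The missing-value warnings can be cross-checked against the dataset statistics. The sketch below assumes that only the four flagged columns contain missing values; it recomputes each percentage from the row count and sums the counts for comparison with the reported total of 5157 missing cells. The dictionary is an illustrative restatement of the warnings.

```python
n_rows = 2823

# Missing counts copied from the "Missing" warnings.
missing_by_column = {
    "ADDRESSLINE2": 2521,
    "STATE": 1486,
    "POSTALCODE": 76,
    "TERRITORY": 1074,
}

# Share of rows with a missing value, per column.
missing_pct = {
    name: round(100 * count / n_rows, 1)
    for name, count in missing_by_column.items()
}
for name, pct in missing_pct.items():
    print(f"{name}: {missing_by_column[name]} ({pct}%)")

# Should equal the "Missing cells" figure if no other column has gaps.
total_missing = sum(missing_by_column.values())
print(f"total missing cells: {total_missing}")
```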

Reproduction

Analysis started: 2020-12-12 09:54:39.516945
Analysis finished: 2020-12-12 09:56:35.664690
Duration: 1 minute and 56.15 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml

Variables

ORDERNUMBER
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 307
Distinct (%): 10.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10258.72512
Minimum: 10100
Maximum: 10425
Zeros: 0
Zeros (%): 0.0%
Memory size: 22.1 KiB

Quantile statistics

Minimum: 10100
5-th percentile: 10115
Q1: 10180
Median: 10262
Q3: 10333.5
95-th percentile: 10405
Maximum: 10425
Range: 325
Interquartile range (IQR): 153.5

Descriptive statistics

Standard deviation: 92.0854776
Coefficient of variation (CV): 0.008976308124
Kurtosis: -1.173309247
Mean: 10258.72512
Median Absolute Deviation (MAD): 79
Skewness: 0.01382298874
Sum: 28960381
Variance: 8479.735184
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Most frequent values

Value | Count | Frequency (%)
10332 | 18 | 0.6%
10386 | 18 | 0.6%
10165 | 18 | 0.6%
10159 | 18 | 0.6%
10168 | 18 | 0.6%
10275 | 18 | 0.6%
10222 | 18 | 0.6%
10398 | 18 | 0.6%
10106 | 18 | 0.6%
10316 | 18 | 0.6%
Other values (297) | 2643 | 93.6%

Smallest values

Value | Count | Frequency (%)
10100 | 4 | 0.1%
10101 | 4 | 0.1%
10102 | 2 | 0.1%
10103 | 16 | 0.6%
10104 | 13 | 0.5%

Largest values

Value | Count | Frequency (%)
10425 | 13 | 0.5%
10424 | 6 | 0.2%
10423 | 5 | 0.2%
10422 | 2 | 0.1%
10421 | 2 | 0.1%
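Several of the ORDERNUMBER statistics follow from the others. As a rough check, the sketch below recomputes the range, IQR, coefficient of variation, variance and mean from the reported figures; the variable names are illustrative, and small differences come from the rounding of the printed inputs.

```python
# Figures copied from the ORDERNUMBER statistics.
n_observations = 2823
minimum, maximum = 10100, 10425
q1, q3 = 10180, 10333.5
mean = 10258.72512
std = 92.0854776
total = 28960381

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range
cv = std / mean                   # Coefficient of variation
variance = std ** 2               # Variance is the squared standard deviation
mean_from_sum = total / n_observations

print(value_range, iqr)
print(f"CV = {cv:.10f}")
print(f"variance = {variance:.4f}")
print(f"mean from sum = {mean_from_sum:.5f}")
```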

QUANTITYORDERED
Real number (ℝ≥0)

Distinct: 58
Distinct (%): 2.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 35.09280907
Minimum: 6
Maximum: 97
Zeros: 0
Zeros (%): 0.0%
Memory size: 22.1 KiB

Quantile statistics

Minimum: 6
5-th percentile: 21
Q1: 27
Median: 35
Q3: 43
95-th percentile: 49
Maximum: 97
Range: 91
Interquartile range (IQR): 16

Descriptive statistics

Standard deviation: 9.741442737
Coefficient of variation (CV): 0.2775908511
Kurtosis: 0.4157437898
Mean: 35.09280907
Median Absolute Deviation (MAD): 8
Skewness: 0.3625853288
Sum: 99067
Variance: 94.8957066
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Most frequent values

Value | Count | Frequency (%)
34 | 112 | 4.0%
21 | 103 | 3.6%
46 | 101 | 3.6%
27 | 100 | 3.5%
45 | 97 | 3.4%
41 | 97 | 3.4%
31 | 97 | 3.4%
26 | 96 | 3.4%
48 | 94 | 3.3%
25 | 94 | 3.3%
Other values (48) | 1832 | 64.9%

Smallest values

Value | Count | Frequency (%)
6 | 2 | 0.1%
10 | 2 | 0.1%
11 | 2 | 0.1%
12 | 1 | < 0.1%
13 | 1 | < 0.1%

Largest values

Value | Count | Frequency (%)
97 | 1 | < 0.1%
85 | 1 | < 0.1%
77 | 1 | < 0.1%
76 | 3 | 0.1%
70 | 2 | 0.1%

PRICEEACH
Real number (ℝ≥0)

Distinct: 1016
Distinct (%): 36.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 83.6585441
Minimum: 26.88
Maximum: 100
Zeros: 0
Zeros (%): 0.0%
Memory size: 22.1 KiB

Quantile statistics

Minimum: 26.88
5-th percentile: 42.67
Q1: 68.86
Median: 95.7
Q3: 100
95-th percentile: 100
Maximum: 100
Range: 73.12
Interquartile range (IQR): 31.14

Descriptive statistics

Standard deviation: 20.17427653
Coefficient of variation (CV): 0.2411502225
Kurtosis: -0.374817693
Mean: 83.6585441
Median Absolute Deviation (MAD): 4.3
Skewness: -0.946648859
Sum: 236168.07
Variance: 407.0014334
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Most frequent values

Value | Count | Frequency (%)
100 | 1304 | 46.2%
96.34 | 6 | 0.2%
59.87 | 6 | 0.2%
67.14 | 5 | 0.2%
51.93 | 5 | 0.2%
80.55 | 5 | 0.2%
57.73 | 5 | 0.2%
61.99 | 5 | 0.2%
90.17 | 5 | 0.2%
89.38 | 5 | 0.2%
Other values (1006) | 1472 | 52.1%

Smallest values

Value | Count | Frequency (%)
26.88 | 1 | < 0.1%
27.22 | 1 | < 0.1%
28.29 | 1 | < 0.1%
28.88 | 1 | < 0.1%
29.21 | 2 | 0.1%

Largest values

Value | Count | Frequency (%)
100 | 1304 | 46.2%
99.91 | 1 | < 0.1%
99.82 | 2 | 0.1%
99.72 | 1 | < 0.1%
99.69 | 1 | < 0.1%

ORDERLINENUMBER
Real number (ℝ≥0)

Distinct: 18
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.46617074
Minimum: 1
Maximum: 18
Zeros: 0
Zeros (%): 0.0%
Memory size: 22.1 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 3
Median: 6
Q3: 9
95-th percentile: 14
Maximum: 18
Range: 17
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 4.225840965
Coefficient of variation (CV): 0.6535306806
Kurtosis: -0.5611542428
Mean: 6.46617074
Median Absolute Deviation (MAD): 3
Skewness: 0.5907412107
Sum: 18254
Variance: 17.85773186
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=18)

Most frequent values

Value | Count | Frequency (%)
1 | 307 | 10.9%
2 | 291 | 10.3%
3 | 270 | 9.6%
4 | 256 | 9.1%
5 | 239 | 8.5%
6 | 221 | 7.8%
7 | 197 | 7.0%
8 | 187 | 6.6%
9 | 165 | 5.8%
10 | 141 | 5.0%
Other values (8) | 549 | 19.4%

Smallest values

Value | Count | Frequency (%)
1 | 307 | 10.9%
2 | 291 | 10.3%
3 | 270 | 9.6%
4 | 256 | 9.1%
5 | 239 | 8.5%

Largest values

Value | Count | Frequency (%)
18 | 10 | 0.4%
17 | 25 | 0.9%
16 | 42 | 1.5%
15 | 56 | 2.0%
14 | 81 | 2.9%

SALES
Real number (ℝ≥0)

Distinct: 2763
Distinct (%): 97.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3553.889072
Minimum: 482.13
Maximum: 14082.8
Zeros: 0
Zeros (%): 0.0%
Memory size: 22.1 KiB

Quantile statistics

Minimum: 482.13
5-th percentile: 1268.757
Q1: 2203.43
Median: 3184.8
Q3: 4508
95-th percentile: 7108.12
Maximum: 14082.8
Range: 13600.67
Interquartile range (IQR): 2304.57

Descriptive statistics

Standard deviation: 1841.865106
Coefficient of variation (CV): 0.5182674722
Kurtosis: 1.792676469
Mean: 3553.889072
Median Absolute Deviation (MAD): 1102.31
Skewness: 1.161076001
Sum: 10032628.85
Variance: 3392467.068
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Most frequent values

Value | Count | Frequency (%)
3003 | 3 | 0.1%
1666.7 | 2 | 0.1%
5984.14 | 2 | 0.1%
1030.44 | 2 | 0.1%
2935.15 | 2 | 0.1%
2795.27 | 2 | 0.1%
1463 | 2 | 0.1%
1742.4 | 2 | 0.1%
2441.04 | 2 | 0.1%
2620.8 | 2 | 0.1%
Other values (2753) | 2802 | 99.3%

Smallest values

Value | Count | Frequency (%)
482.13 | 1 | < 0.1%
541.14 | 1 | < 0.1%
553.95 | 1 | < 0.1%
577.6 | 1 | < 0.1%
640.05 | 1 | < 0.1%

Largest values

Value | Count | Frequency (%)
14082.8 | 1 | < 0.1%
12536.5 | 1 | < 0.1%
12001 | 1 | < 0.1%
11887.8 | 1 | < 0.1%
11886.6 | 1 | < 0.1%

ORDERDATE
Categorical

HIGH CARDINALITY

Distinct: 252
Distinct (%): 8.9%
Missing: 0
Missing (%): 0.0%
Memory size: 22.1 KiB

Value | Count | Frequency (%)
11/14/2003 0:00 | 38 | 1.3%
11/24/2004 0:00 | 35 | 1.2%
11/12/2003 0:00 | 34 | 1.2%
11/17/2004 0:00 | 32 | 1.1%
11/4/2004 0:00 | 29 | 1.0%
10/16/2004 0:00 | 28 | 1.0%
12/2/2003 0:00 | 28 | 1.0%
11/5/2003 0:00 | 28 | 1.0%
11/6/2003 0:00 | 27 | 1.0%
8/20/2004 0:00 | 27 | 1.0%
Other values (242) | 2517 | 89.2%

Frequencies of value counts

Unique

Unique: 9
Unique (%): 0.3%

Histogram of lengths of the category

Length

Max length: 15
Median length: 14
Mean length: 14.04463337
Min length: 13

STATUS
Categorical

Distinct: 6
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 22.1 KiB

Value | Count | Frequency (%)
Shipped | 2617 | 92.7%
Cancelled | 60 | 2.1%
Resolved | 47 | 1.7%
On Hold | 44 | 1.6%
In Process | 41 | 1.5%
Disputed | 14 | 0.5%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 10
Median length: 7
Mean length: 7.107686858
Min length: 7
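The length statistics for a categorical variable can be recomputed from its value counts. The sketch below does this for STATUS, assuming the length of a category is the character count of its label; the `status_counts` dictionary is an illustrative restatement of the frequency table.

```python
from statistics import median

# Value counts copied from the STATUS frequency table.
status_counts = {
    "Shipped": 2617,
    "Cancelled": 60,
    "Resolved": 47,
    "On Hold": 44,
    "In Process": 41,
    "Disputed": 14,
}

# One length entry per observation, so frequent labels weigh more.
lengths = [len(label) for label, count in status_counts.items() for _ in range(count)]

max_length = max(lengths)
min_length = min(lengths)
median_length = median(lengths)
mean_length = sum(lengths) / len(lengths)

print(max_length, median_length, min_length)
print(f"mean length = {mean_length:.9f}")
```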

QTR_ID
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 22.1 KiB

Value | Count | Frequency (%)
4 | 1094 | 38.8%
1 | 665 | 23.6%
2 | 561 | 19.9%
3 | 503 | 17.8%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

MONTH_ID
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 12
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.092454835
Minimum: 1
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Memory size: 22.1 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 4
Median: 8
Q3: 11
95-th percentile: 12
Maximum: 12
Range: 11
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 3.656633308
Coefficient of variation (CV): 0.515566668
Kurtosis: -1.38327478
Mean: 7.092454835
Median Absolute Deviation (MAD): 3
Skewness: -0.2729015635
Sum: 20022
Variance: 13.37096715
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=12)

Most frequent values

Value | Count | Frequency (%)
11 | 597 | 21.1%
10 | 317 | 11.2%
5 | 252 | 8.9%
1 | 229 | 8.1%
2 | 224 | 7.9%
3 | 212 | 7.5%
8 | 191 | 6.8%
12 | 180 | 6.4%
4 | 178 | 6.3%
9 | 171 | 6.1%
Other values (2) | 272 | 9.6%

Smallest values

Value | Count | Frequency (%)
1 | 229 | 8.1%
2 | 224 | 7.9%
3 | 212 | 7.5%
4 | 178 | 6.3%
5 | 252 | 8.9%

Largest values

Value | Count | Frequency (%)
12 | 180 | 6.4%
11 | 597 | 21.1%
10 | 317 | 11.2%
9 | 171 | 6.1%
8 | 191 | 6.8%

YEAR_ID
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 22.1 KiB

Value | Count | Frequency (%)
2004 | 1345 | 47.6%
2003 | 1000 | 35.4%
2005 | 478 | 16.9%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

PRODUCTLINE
Categorical

Distinct: 7
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 22.1 KiB

Value | Count | Frequency (%)
Classic Cars | 967 | 34.3%
Vintage Cars | 607 | 21.5%
Motorcycles | 331 | 11.7%
Planes | 306 | 10.8%
Trucks and Buses | 301 | 10.7%
Ships | 234 | 8.3%
Trains | 77 | 2.7%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 16
Median length: 12
Mean length: 10.91498406
Min length: 5

MSRP
Real number (ℝ≥0)

Distinct: 80
Distinct (%): 2.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 100.7155508
Minimum: 33
Maximum: 214
Zeros: 0
Zeros (%): 0.0%
Memory size: 22.1 KiB

Quantile statistics

Minimum: 33
5-th percentile: 43
Q1: 68
Median: 99
Q3: 124
95-th percentile: 170
Maximum: 214
Range: 181
Interquartile range (IQR): 56

Descriptive statistics

Standard deviation: 40.18791168
Coefficient of variation (CV): 0.3990238979
Kurtosis: -0.1318145207
Mean: 100.7155508
Median Absolute Deviation (MAD): 28
Skewness: 0.5801750539
Sum: 284320
Variance: 1615.068245
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Most frequent values

Value | Count | Frequency (%)
118 | 104 | 3.7%
99 | 103 | 3.6%
136 | 80 | 2.8%
62 | 78 | 2.8%
68 | 77 | 2.7%
60 | 76 | 2.7%
80 | 73 | 2.6%
101 | 54 | 1.9%
115 | 54 | 1.9%
54 | 54 | 1.9%
Other values (70) | 2070 | 73.3%

Smallest values

Value | Count | Frequency (%)
33 | 25 | 0.9%
35 | 28 | 1.0%
37 | 27 | 1.0%
40 | 25 | 0.9%
41 | 22 | 0.8%

Largest values

Value | Count | Frequency (%)
214 | 28 | 1.0%
207 | 26 | 0.9%
194 | 25 | 0.9%
193 | 26 | 0.9%
173 | 26 | 0.9%

PRODUCTCODE
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 109
Distinct (%): 3.9%
Missing: 0
Missing (%): 0.0%
Memory size: 22.1 KiB

Value | Count | Frequency (%)
S18_3232 | 52 | 1.8%
S32_2509 | 28 | 1.0%
S10_4962 | 28 | 1.0%
S18_1097 | 28 | 1.0%
S50_1392 | 28 | 1.0%
S18_2432 | 28 | 1.0%
S12_1666 | 28 | 1.0%
S24_2840 | 28 | 1.0%
S24_1444 | 28 | 1.0%
S10_1949 | 28 | 1.0%
Other values (99) | 2519 | 89.2%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 9
Median length: 8
Mean length: 8.110874956
Min length: 8

CUSTOMERNAME
Categorical

HIGH CARDINALITY
HIGH CORRELATION

Distinct: 92
Distinct (%): 3.3%
Missing: 0
Missing (%): 0.0%
Memory size: 22.1 KiB

Value | Count | Frequency (%)
Euro Shopping Channel | 259 | 9.2%
Mini Gifts Distributors Ltd. | 180 | 6.4%
Australian Collectors, Co. | 55 | 1.9%
La Rochelle Gifts | 53 | 1.9%
AV Stores, Co. | 51 | 1.8%
Land of Toys Inc. | 49 | 1.7%
Rovelli Gifts | 48 | 1.7%
Muscle Machine Inc | 48 | 1.7%
Anna's Decorations, Ltd | 46 | 1.6%
Souveniers And Things Co. | 46 | 1.6%
Other values (82) | 1988 | 70.4%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 34
Median length: 21
Mean length: 20.97272405
Min length: 10

PHONE
Categorical

HIGH CARDINALITY
HIGH CORRELATION

Distinct: 91
Distinct (%): 3.2%
Missing: 0
Missing (%): 0.0%
Memory size: 22.1 KiB

Value | Count | Frequency (%)
(91) 555 94 44 | 259 | 9.2%
4155551450 | 180 | 6.4%
03 9520 4555 | 55 | 1.9%
40.67.8555 | 53 | 1.9%
(171) 555-1555 | 51 | 1.8%
6175558555 | 51 | 1.8%
2125557818 | 49 | 1.7%
035-640555 | 48 | 1.7%
2125557413 | 48 | 1.7%
+61 2 9495 8555 | 46 | 1.6%
Other values (81) | 1983 | 70.2%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 17
Median length: 10
Mean length: 11.63655685
Min length: 9

ADDRESSLINE1
Categorical

HIGH CARDINALITY
HIGH CORRELATION

Distinct: 92
Distinct (%): 3.3%
Missing: 0
Missing (%): 0.0%
Memory size: 22.1 KiB

Value | Count | Frequency (%)
C/ Moralzarzal, 86 | 259 | 9.2%
5677 Strong St. | 180 | 6.4%
636 St Kilda Road | 55 | 1.9%
67, rue des Cinquante Otages | 53 | 1.9%
Fauntleroy Circus | 51 | 1.8%
897 Long Airport Avenue | 49 | 1.7%
4092 Furth Circle | 48 | 1.7%
Via Ludovico il Moro 22 | 48 | 1.7%
201 Miller Street | 46 | 1.6%
Monitor Money Building, 815 Pacific Hwy | 46 | 1.6%
Other values (82) | 1988 | 70.4%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 42
Median length: 18
Mean length: 19.44597945
Min length: 11

ADDRESSLINE2
Categorical

HIGH CORRELATION
MISSING

Distinct: 9
Distinct (%): 3.0%
Missing: 2521
Missing (%): 89.3%
Memory size: 22.1 KiB

Value | Count | Frequency (%)
Level 3 | 55 | 1.9%
Suite 400 | 48 | 1.7%
Level 15 | 46 | 1.6%
Level 6 | 46 | 1.6%
2nd Floor | 36 | 1.3%
Suite 101 | 25 | 0.9%
Suite 750 | 20 | 0.7%
Floor No. 4 | 16 | 0.6%
Suite 200 | 10 | 0.4%
(Missing) | 2521 | 89.3%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 11
Median length: 3
Mean length: 3.565356004
Min length: 3

CITY
Categorical

HIGH CARDINALITY
HIGH CORRELATION

Distinct: 73
Distinct (%): 2.6%
Missing: 0
Missing (%): 0.0%
Memory size: 22.1 KiB

Value | Count | Frequency (%)
Madrid | 304 | 10.8%
San Rafael | 180 | 6.4%
NYC | 152 | 5.4%
Singapore | 79 | 2.8%
Paris | 70 | 2.5%
San Francisco | 62 | 2.2%
New Bedford | 61 | 2.2%
Nantes | 60 | 2.1%
Melbourne | 55 | 1.9%
Manchester | 51 | 1.8%
Other values (63) | 1749 | 62.0%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 14
Median length: 8
Mean length: 7.753099539
Min length: 3

STATE
Categorical

HIGH CORRELATION
MISSING

Distinct: 16
Distinct (%): 1.2%
Missing: 1486
Missing (%): 52.6%
Memory size: 22.1 KiB

Value | Count | Frequency (%)
CA | 416 | 14.7%
MA | 190 | 6.7%
NY | 178 | 6.3%
NSW | 92 | 3.3%
Victoria | 78 | 2.8%
PA | 75 | 2.7%
CT | 61 | 2.2%
BC | 48 | 1.7%
NH | 34 | 1.2%
Tokyo | 32 | 1.1%
Other values (6) | 133 | 4.7%
(Missing) | 1486 | 52.6%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 13
Median length: 3
Mean length: 2.955012398
Min length: 2

POSTALCODE
Categorical

HIGH CARDINALITY
HIGH CORRELATION
MISSING

Distinct: 73
Distinct (%): 2.7%
Missing: 76
Missing (%): 2.7%
Memory size: 22.1 KiB

Value | Count | Frequency (%)
28034 | 259 | 9.2%
97562 | 205 | 7.3%
10022 | 152 | 5.4%
94217 | 89 | 3.2%
50553 | 61 | 2.2%
44000 | 60 | 2.1%
3004 | 55 | 1.9%
EC2 5NT | 51 | 1.8%
24100 | 48 | 1.7%
58339 | 47 | 1.7%
Other values (63) | 1720 | 60.9%
(Missing) | 76 | 2.7%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 9
Median length: 5
Mean length: 5.153737159
Min length: 1

COUNTRY
Categorical

HIGH CORRELATION

Distinct: 19
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Memory size: 22.1 KiB

Value | Count | Frequency (%)
USA | 1004 | 35.6%
Spain | 342 | 12.1%
France | 314 | 11.1%
Australia | 185 | 6.6%
UK | 144 | 5.1%
Italy | 113 | 4.0%
Finland | 92 | 3.3%
Norway | 85 | 3.0%
Singapore | 79 | 2.8%
Canada | 70 | 2.5%
Other values (9) | 395 | 14.0%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 11
Median length: 5
Mean length: 5.044633369
Min length: 2

TERRITORY
Categorical

HIGH CORRELATION
MISSING

Distinct: 3
Distinct (%): 0.2%
Missing: 1074
Missing (%): 38.0%
Memory size: 22.1 KiB

Value | Count | Frequency (%)
EMEA | 1407 | 49.8%
APAC | 221 | 7.8%
Japan | 121 | 4.3%
(Missing) | 1074 | 38.0%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 5
Median length: 4
Mean length: 3.66241587
Min length: 3

CONTACTLASTNAME
Categorical

HIGH CARDINALITY
HIGH CORRELATION

Distinct: 77
Distinct (%): 2.7%
Missing: 0
Missing (%): 0.0%
Memory size: 22.1 KiB

Value | Count | Frequency (%)
Freyre | 259 | 9.2%
Nelson | 204 | 7.2%
Young | 115 | 4.1%
Frick | 91 | 3.2%
Brown | 88 | 3.1%
Yu | 80 | 2.8%
Hernandez | 70 | 2.5%
Ferguson | 55 | 1.9%
King | 54 | 1.9%
Labrune | 53 | 1.9%
Other values (67) | 1754 | 62.1%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 11
Median length: 6
Mean length: 6.441374424
Min length: 2

CONTACTFIRSTNAME
Categorical

HIGH CARDINALITY
HIGH CORRELATION

Distinct: 72
Distinct (%): 2.6%
Missing: 0
Missing (%): 0.0%
Memory size: 22.1 KiB

Value | Count | Frequency (%)
Diego | 259 | 9.2%
Valarie | 257 | 9.1%
Julie | 117 | 4.1%
Michael | 84 | 3.0%
Sue | 84 | 3.0%
Juri | 60 | 2.1%
Maria | 58 | 2.1%
Elizabeth | 55 | 1.9%
Peter | 55 | 1.9%
Janine | 53 | 1.9%
Other values (62) | 1741 | 61.7%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 10
Median length: 5
Mean length: 5.668083599
Min length: 3

DEALSIZE
Categorical

Distinct: 3
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 22.1 KiB

Value | Count | Frequency (%)
Medium | 1384 | 49.0%
Small | 1282 | 45.4%
Large | 157 | 5.6%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 6
Median length: 5
Mean length: 5.49025859
Min length: 5

Interactions

[Figures: 49 interaction plots]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that, for a linear relationship, the steepness of the line does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
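As an illustration of this definition, the sketch below computes r with the standard library only. The function name `pearson_r` and the sample data are illustrative and are not taken from the report's own implementation. The second call shows the invariance to location and scale.

```python
import math

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("r is undefined for a constant variable")
    return cov / (dev_x * dev_y)

xs = [1.0, 2.0, 4.0, 7.0, 11.0]
ys = [2.0, 3.0, 3.5, 8.0, 9.0]

r_original = pearson_r(xs, ys)
# Shifting and rescaling either variable (by a positive factor) leaves r unchanged.
r_rescaled = pearson_r([10 * x + 5 for x in xs], [0.5 * y - 3 for y in ys])
print(r_original, r_rescaled)
```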

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
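The calculation can be sketched as Pearson's r applied to ranks. This is a minimal standard-library version; the helper names are illustrative, and tied values are assumed to receive the average of their positions, which is a common convention.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (dev_x * dev_y)

def spearman_rho(xs, ys):
    """Pearson's r computed on the rank variables of xs and ys."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# A monotonic but nonlinear relationship: rho is 1 while r stays below 1.
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
print(spearman_rho(xs, ys), pearson_r(xs, ys))
```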

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
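The pair-counting definition can be sketched directly. Note the assumption: this follows the plain formula in the text (often called tau-a), whereas library implementations commonly apply a correction for ties, so results may differ when ties are present. The function name is illustrative.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs, without tie correction."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(list(zip(xs, ys)), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move the same way
        elif direction < 0:
            discordant += 1   # the variables move in opposite ways
        # direction == 0 means a tie in x or y; it counts for neither
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

# Six pairs in total: five concordant, one discordant (the swap of 3 and 2).
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(tau)
```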
2020-12-12T15:27:26.166295image/svg+xmlMatplotlib v3.3.3, https://matplotlib.org/

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
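A bias-corrected Cramér's V can be sketched with the standard library as follows. This is an illustrative implementation of the correction as it is commonly stated (φ² reduced by (k-1)(r-1)/(n-1), with the category counts adjusted accordingly), not the report's own code; the function name and sample data are assumptions.

```python
import math
from collections import Counter

def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equally long sequences of nominal labels."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denominator = min(k_corr - 1, r_corr - 1)
    if denominator <= 0:
        return 0.0  # a constant variable carries no association
    return math.sqrt(phi2_corr / denominator)

# Each label of xs determines the label of ys: perfect association.
perfect = cramers_v_corrected(["a"] * 50 + ["b"] * 50, ["x"] * 50 + ["y"] * 50)
# Every combination equally frequent: independence.
independent = cramers_v_corrected(["a", "a", "b", "b"] * 25, ["x", "y", "x", "y"] * 25)
print(perfect, independent)
```

A one-to-one mapping between two columns, as the warnings suggest for CUSTOMERNAME and PHONE, would give a value close to 1.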

Missing values

[Figures: 4 missing-value plots]

Sample

First rows

Index | ORDERNUMBER | QUANTITYORDERED | PRICEEACH | ORDERLINENUMBER | SALES | ORDERDATE | STATUS | QTR_ID | MONTH_ID | YEAR_ID | PRODUCTLINE | MSRP | PRODUCTCODE | CUSTOMERNAME | PHONE | ADDRESSLINE1 | ADDRESSLINE2 | CITY | STATE | POSTALCODE | COUNTRY | TERRITORY | CONTACTLASTNAME | CONTACTFIRSTNAME | DEALSIZE
0 | 10107 | 30 | 95.70 | 2 | 2871.00 | 2/24/2003 0:00 | Shipped | 1 | 2 | 2003 | Motorcycles | 95 | S10_1678 | Land of Toys Inc. | 2125557818 | 897 Long Airport Avenue | NaN | NYC | NY | 10022 | USA | NaN | Yu | Kwai | Small
1 | 10121 | 34 | 81.35 | 5 | 2765.90 | 5/7/2003 0:00 | Shipped | 2 | 5 | 2003 | Motorcycles | 95 | S10_1678 | Reims Collectables | 26.47.1555 | 59 rue de l'Abbaye | NaN | Reims | NaN | 51100 | France | EMEA | Henriot | Paul | Small
2 | 10134 | 41 | 94.74 | 2 | 3884.34 | 7/1/2003 0:00 | Shipped | 3 | 7 | 2003 | Motorcycles | 95 | S10_1678 | Lyon Souveniers | +33 1 46 62 7555 | 27 rue du Colonel Pierre Avia | NaN | Paris | NaN | 75508 | France | EMEA | Da Cunha | Daniel | Medium
3 | 10145 | 45 | 83.26 | 6 | 3746.70 | 8/25/2003 0:00 | Shipped | 3 | 8 | 2003 | Motorcycles | 95 | S10_1678 | Toys4GrownUps.com | 6265557265 | 78934 Hillside Dr. | NaN | Pasadena | CA | 90003 | USA | NaN | Young | Julie | Medium
4 | 10159 | 49 | 100.00 | 14 | 5205.27 | 10/10/2003 0:00 | Shipped | 4 | 10 | 2003 | Motorcycles | 95 | S10_1678 | Corporate Gift Ideas Co. | 6505551386 | 7734 Strong St. | NaN | San Francisco | CA | NaN | USA | NaN | Brown | Julie | Medium
5 | 10168 | 36 | 96.66 | 1 | 3479.76 | 10/28/2003 0:00 | Shipped | 4 | 10 | 2003 | Motorcycles | 95 | S10_1678 | Technics Stores Inc. | 6505556809 | 9408 Furth Circle | NaN | Burlingame | CA | 94217 | USA | NaN | Hirano | Juri | Medium
6 | 10180 | 29 | 86.13 | 9 | 2497.77 | 11/11/2003 0:00 | Shipped | 4 | 11 | 2003 | Motorcycles | 95 | S10_1678 | Daedalus Designs Imports | 20.16.1555 | 184, chausse de Tournai | NaN | Lille | NaN | 59000 | France | EMEA | Rance | Martine | Small
7 | 10188 | 48 | 100.00 | 1 | 5512.32 | 11/18/2003 0:00 | Shipped | 4 | 11 | 2003 | Motorcycles | 95 | S10_1678 | Herkku Gifts | +47 2267 3215 | Drammen 121, PR 744 Sentrum | NaN | Bergen | NaN | N 5804 | Norway | EMEA | Oeztan | Veysel | Medium
8 | 10201 | 22 | 98.57 | 2 | 2168.54 | 12/1/2003 0:00 | Shipped | 4 | 12 | 2003 | Motorcycles | 95 | S10_1678 | Mini Wheels Co. | 6505555787 | 5557 North Pendale Street | NaN | San Francisco | CA | NaN | USA | NaN | Murphy | Julie | Small
9 | 10211 | 41 | 100.00 | 14 | 4708.44 | 1/15/2004 0:00 | Shipped | 1 | 1 | 2004 | Motorcycles | 95 | S10_1678 | Auto Canal Petit | (1) 47.55.6555 | 25, rue Lauriston | NaN | Paris | NaN | 75016 | France | EMEA | Perrier | Dominique | Medium

Last rows

Index | ORDERNUMBER | QUANTITYORDERED | PRICEEACH | ORDERLINENUMBER | SALES | ORDERDATE | STATUS | QTR_ID | MONTH_ID | YEAR_ID | PRODUCTLINE | MSRP | PRODUCTCODE | CUSTOMERNAME | PHONE | ADDRESSLINE1 | ADDRESSLINE2 | CITY | STATE | POSTALCODE | COUNTRY | TERRITORY | CONTACTLASTNAME | CONTACTFIRSTNAME | DEALSIZE
2813 | 10293 | 32 | 60.06 | 1 | 1921.92 | 9/9/2004 0:00 | Shipped | 3 | 9 | 2004 | Ships | 54 | S72_3212 | Amica Models & Co. | 011-4988555 | Via Monte Bianco 34 | NaN | Torino | NaN | 10100 | Italy | EMEA | Accorti | Paolo | Small
2814 | 10306 | 35 | 59.51 | 6 | 2082.85 | 10/14/2004 0:00 | Shipped | 4 | 10 | 2004 | Ships | 54 | S72_3212 | AV Stores, Co. | (171) 555-1555 | Fauntleroy Circus | NaN | Manchester | NaN | EC2 5NT | UK | EMEA | Ashworth | Victoria | Small
2815 | 10315 | 40 | 55.69 | 5 | 2227.60 | 10/29/2004 0:00 | Shipped | 4 | 10 | 2004 | Ships | 54 | S72_3212 | La Rochelle Gifts | 40.67.8555 | 67, rue des Cinquante Otages | NaN | Nantes | NaN | 44000 | France | EMEA | Labrune | Janine | Small
2816 | 10327 | 37 | 86.74 | 4 | 3209.38 | 11/10/2004 0:00 | Resolved | 4 | 11 | 2004 | Ships | 54 | S72_3212 | Danish Wholesale Imports | 31 12 3555 | Vinb'ltet 34 | NaN | Kobenhavn | NaN | 1734 | Denmark | EMEA | Petersen | Jytte | Medium
2817 | 10337 | 42 | 97.16 | 5 | 4080.72 | 11/21/2004 0:00 | Shipped | 4 | 11 | 2004 | Ships | 54 | S72_3212 | Classic Legends Inc. | 2125558493 | 5905 Pompton St. | Suite 750 | NYC | NY | 10022 | USA | NaN | Hernandez | Maria | Medium
2818 | 10350 | 20 | 100.00 | 15 | 2244.40 | 12/2/2004 0:00 | Shipped | 4 | 12 | 2004 | Ships | 54 | S72_3212 | Euro Shopping Channel | (91) 555 94 44 | C/ Moralzarzal, 86 | NaN | Madrid | NaN | 28034 | Spain | EMEA | Freyre | Diego | Small
2819 | 10373 | 29 | 100.00 | 1 | 3978.51 | 1/31/2005 0:00 | Shipped | 1 | 1 | 2005 | Ships | 54 | S72_3212 | Oulu Toy Supplies, Inc. | 981-443655 | Torikatu 38 | NaN | Oulu | NaN | 90110 | Finland | EMEA | Koskitalo | Pirkko | Medium
2820 | 10386 | 43 | 100.00 | 4 | 5417.57 | 3/1/2005 0:00 | Resolved | 1 | 3 | 2005 | Ships | 54 | S72_3212 | Euro Shopping Channel | (91) 555 94 44 | C/ Moralzarzal, 86 | NaN | Madrid | NaN | 28034 | Spain | EMEA | Freyre | Diego | Medium
2821 | 10397 | 34 | 62.24 | 1 | 2116.16 | 3/28/2005 0:00 | Shipped | 1 | 3 | 2005 | Ships | 54 | S72_3212 | Alpha Cognac | 61.77.6555 | 1 rue Alsace-Lorraine | NaN | Toulouse | NaN | 31000 | France | EMEA | Roulet | Annette | Small
2822 | 10414 | 47 | 65.52 | 9 | 3079.44 | 5/6/2005 0:00 | On Hold | 2 | 5 | 2005 | Ships | 54 | S72_3212 | Gifts4AllAges.com | 6175559555 | 8616 Spinnaker Dr. | NaN | Boston | MA | 51003 | USA | NaN | Yoshido | Juri | Medium
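
The sample rows are consistent with QTR_ID, MONTH_ID and YEAR_ID being calendar parts of ORDERDATE, which would explain the high MONTH_ID and QTR_ID correlation in the warnings. The sketch below works under that assumption; the helper `date_parts` is hypothetical, and the date format string is inferred from the values shown in the sample.

```python
from datetime import datetime

def date_parts(order_date):
    """Return (quarter, month, year) for an ORDERDATE string such as '2/24/2003 0:00'."""
    parsed = datetime.strptime(order_date, "%m/%d/%Y %H:%M")
    quarter = (parsed.month - 1) // 3 + 1
    return quarter, parsed.month, parsed.year

# ORDERDATE -> (QTR_ID, MONTH_ID, YEAR_ID) as shown in the sample rows.
sample_rows = {
    "2/24/2003 0:00": (1, 2, 2003),
    "10/10/2003 0:00": (4, 10, 2003),
    "5/6/2005 0:00": (2, 5, 2005),
}

for order_date, expected in sample_rows.items():
    print(order_date, date_parts(order_date), expected)
```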